Combining labelled and unlabelled data

Authors

Bogdan Gabrys

Lina Petrakieva

Abstract

There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when using different amounts of unlabelled data. This paper presents an analysis of the performance of supervised methods supported by unlabelled data, and of some semi-supervised approaches, using different ratios of labelled to unlabelled samples. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising classification performance. If only a very limited amount of labelled data is available, the results show high variability, and the performance of the final classifier depends more on how reliable the labelled samples are than on the use of additional unlabelled data. Semi-supervised clustering utilising both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the considered problem.

Similar resources

Pattern recognition using labelled and unlabelled data

This thesis presents the results of a three-year investigation into combining labelled and unlabelled data for data classification. In the present world, there are many fields in which the quantity of data available to workers in that field has increased exponentially over the last few years. This has in part been due to improved methods of automatic data capture and in part due to improved ele...

Combining labelled and unlabelled data in the design of pattern classification systems

Intimate Learning: A Novel Approach for Combining Labelled and Unlabelled Data

This paper introduces a new bootstrapping method closely related to co-training and scoped-learning. The method is tested on a Web information extraction task of learning course names from web pages in which we use very few labelled items as seed data (10 web pages) and combine with an unlabelled set (174 web pages). The overall performance improved the precision/recall from 3.11%/0.31% for a b...

Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition

We address the problem of using partially labelled data, e.g. large collections where only a little data is annotated, for extracting biological entities. Our approach relies on a combination of probabilistic models, which we use to model the generation of entities and their context, and kernel machines, which implement powerful categorisers based on a similarity measure and some labelled data. This...

A Labelled Graph Based Multiple Classifier System

In general, classifying graphs with labelled nodes (also known as labelled graphs) is a more difficult task than classifying graphs with unlabelled nodes. In this work, we decompose the labelled graphs into unlabelled subgraphs with respect to the labels, and describe these decomposed subgraphs with the travelling matrices. By utilizing the travelling matrices to calculate the dissimilarity for...

Publication date: 2005
